Needles and Haystacks: A Search Engine for Personal Information Collections
Abstract
Information retrieval systems can be partitioned into two main classes: large-scale systems that make use of an inverted index or some other auxiliary data structure, intended for massive volumes of data; and the small-scale systems based upon sequential pattern matching that most computer users employ when hunting for missing email and news items. In this paper we describe a hybrid approach that offers the ranked queries and similarity matching of a genuine information retrieval system, but does so without any need for an index to be precomputed. This software tool, which we call seft, offers performance that in a retrieval effectiveness sense matches conventional information retrieval systems, and in a resource efficiency sense, while considerably slower than grep-like tools, is fast enough to be useful on hundreds of megabytes of text.
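The index-free ranked search described in the abstract can be illustrated with a minimal sketch. This is not seft's actual algorithm: the abstract does not specify its similarity measure, so the TF-IDF style score, the tokenizer, and the name `ranked_scan` are all assumptions made for illustration. The sketch reads every document sequentially at query time, gathers the statistics it needs during that scan, and returns the top-ranked documents with no precomputed index.

```python
import heapq
import math
import re
from collections import Counter

# Hypothetical tokenizer: lowercase alphanumeric runs (an assumption, not seft's rule).
TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    return TOKEN.findall(text.lower())


def ranked_scan(documents, query, k=3):
    """Rank documents against a query by scanning them sequentially.

    documents: iterable of (doc_id, text) pairs, read once, no index needed.
    Returns up to k (score, doc_id) pairs, best first.
    The scoring is a generic TF-IDF variant chosen for illustration only.
    """
    query_terms = set(tokenize(query))
    per_doc = []          # (doc_id, query-term counts, document length)
    doc_freq = Counter()  # number of documents containing each query term
    n_docs = 0

    # Single sequential pass: collect term counts and document frequencies.
    for doc_id, text in documents:
        n_docs += 1
        tokens = tokenize(text)
        hits = Counter(t for t in tokens if t in query_terms)
        doc_freq.update(hits.keys())
        if hits:
            per_doc.append((doc_id, hits, len(tokens)))

    # Score only the documents that matched at least one query term.
    scored = []
    for doc_id, hits, length in per_doc:
        score = sum(
            (1 + math.log(tf)) * math.log(1 + n_docs / doc_freq[term])
            for term, tf in hits.items()
        ) / math.sqrt(length)  # dampen the advantage of long documents
        scored.append((score, doc_id))

    return heapq.nlargest(k, scored)


if __name__ == "__main__":
    docs = [
        ("a", "needle in a haystack"),
        ("b", "hay hay hay"),
        ("c", "the needle and the needle"),
    ]
    for score, doc_id in ranked_scan(docs, "needle"):
        print(doc_id, round(score, 3))
```

Because all statistics are gathered during the scan itself, the cost of each query grows with the size of the collection, which is consistent with the abstract's remark that this style of search is slower than grep-like tools yet workable on moderately sized collections.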
Similar resources
Guest Editors' Introduction: Information Discovery--Needles and Haystacks
For thousands of years, people have realized the importance of archiving and finding information. With the advent of computers, it became possible to store large amounts of information in electronic form — and finding useful needles in the resulting haystacks has since become one of the most important problems in information management. Many systems exist to help users navigate the considerable...
Looking for a Haystack: Selecting Data Sources in a Distributed Retrieval System
The Internet contains billions of documents and thousands of systems for searching over these documents. Searching for a useful document can be as difficult as the proverbial search for a needle in a haystack. Each search engine provides access to a different collection of documents. Collections may be large or small, focused or comprehensive. Focused collections may be centered on any possible...
The Web in 2010: Challenges and Opportunities for Database Research
The impressive advances in global networking and information technology provide great opportunities for all kinds of Web-based information services, ranging from digital libraries and information discovery to virtual-enterprise workflows and electronic commerce. However, many of these services still exhibit rather poor quality in terms of unacceptable performance during load peaks, frequent and...
A Strategy for Evaluating Search of “Real” Personal Information Archives
Personal information archives (PIAs) can include materials from many sources, e.g. desktop and laptop computers, mobile phones, etc. Evaluation of personal search over these collections is problematic for reasons relating to the personal and private nature of the data and associated information needs and measuring system response effectiveness. Conventional information retrieval (IR) evaluation...